364 research outputs found
Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word ``reinforcement.'' The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying files
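The exploration/exploitation trade-off and learning from delayed reinforcement that the survey discusses can be illustrated with tabular Q-learning on a toy chain MDP. This is a hedged sketch: the chain environment and every parameter value below are illustrative choices, not taken from the survey.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy chain MDP: action 1 moves right,
    action 0 moves left, and reaching the last state pays reward 1."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Exploration/exploitation trade-off: epsilon-greedy choice,
            # with ties between equal Q-values broken at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, v in enumerate(Q[s]) if v == best])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Delayed reward propagates backward through the bootstrap term.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, the greedy policy prefers moving right in every non-terminal state, even though reward only arrives at the end of the chain.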
Visualizing Convolutional Networks for MRI-based Diagnosis of Alzheimer's Disease
Visualizing and interpreting convolutional neural networks (CNNs) is an
important task to increase trust in automatic medical decision making systems.
In this study, we train a 3D CNN to detect Alzheimer's disease based on
structural MRI scans of the brain. Then, we apply four different gradient-based
and occlusion-based visualization methods that explain the network's
classification decisions by highlighting relevant areas in the input image. We
compare the methods qualitatively and quantitatively. We find that all four
methods focus on brain regions known to be involved in Alzheimer's disease,
such as inferior and middle temporal gyrus. While the occlusion-based methods
focus more on specific regions, the gradient-based methods pick up distributed
relevance patterns. Additionally, we find that the distribution of relevance
varies across patients, with some having a stronger focus on the temporal lobe,
whereas for others more cortical areas are relevant. In summary, we show that
applying different visualization methods is important to understand the
decisions of a CNN, a step that is crucial to increase clinical impact and
trust in computer-based decision support systems.
Comment: MLCN 201
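Of the methods compared, occlusion-based visualization is the easiest to sketch: slide a patch over the input, replace it with a baseline value, and record how much the model's score drops. The 2D toy below shows the idea; the paper works on 3D MRI volumes, and the patch size and stand-in scoring function here are invented for illustration.

```python
def occlusion_map(predict, image, patch, stride, baseline=0.0):
    """Occlusion sensitivity: slide a patch over the image, replace the
    covered pixels with a baseline value, and record how far the model's
    score drops. Large drops mark regions the decision depends on."""
    h, w = len(image), len(image[0])
    ref = predict(image)
    heat = [[0.0] * w for _ in range(h)]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    occluded[yy][xx] = baseline
            drop = ref - predict(occluded)
            for yy in range(y, y + patch):
                for xx in range(x, x + patch):
                    heat[yy][xx] = drop
    return heat

# Stand-in "model": score is the mean intensity of the top-left quadrant,
# so only occluding that quadrant should change the score.
def score(img):
    return sum(img[y][x] for y in range(16) for x in range(16)) / 256.0

image = [[1.0] * 32 for _ in range(32)]
heat = occlusion_map(score, image, patch=16, stride=16)
```

The heatmap is non-zero exactly where the stand-in model looks, which is the property the paper exploits to localize disease-relevant brain regions.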
Context-Aware Conversational Agents Using POMDPs and Agenda-Based Simulation
Proceedings of: Workshop on User-Centric Technologies and Applications (CONTEXTS 2011), Salamanca, April 6-8, 2011
Context-aware systems in combination with mobile devices offer new
opportunities in the areas of knowledge representation, natural language
processing and intelligent information retrieval. Our vision is that natural
spoken conversation with these devices can eventually become the preferred
mode for managing their services by means of conversational agents. In this
paper, we describe the application of POMDPs and agenda-based user simulation
to learn optimal dialog policies for the dialog manager in a conversational
agent. We have applied this approach to develop a statistical dialog manager
for a conversational agent which acts as a voice logbook to collect home
monitored data from patients suffering from diabetes.
Funded by projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485), and DPS2008-07029-C02-02.
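The POMDP view treats the user's goal as hidden state that the dialog manager tracks as a belief distribution, updated after each observation. A minimal sketch of the discrete Bayesian belief update follows; the goal set, observation model, and probabilities are invented for illustration and are not the paper's model.

```python
def belief_update(belief, obs_prob, action, observation):
    """Bayesian belief update over a static hidden user goal:
    b'(g) is proportional to P(observation | goal, action) * b(g)."""
    unnorm = {g: obs_prob(observation, g, action) * p
              for g, p in belief.items()}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Toy observation model: speech recognition hears the true goal
# with probability 0.8 and confuses it otherwise.
def obs_prob(obs, goal, action):
    return 0.8 if obs == goal else 0.2

belief = {"log_glucose": 0.5, "log_insulin": 0.5}
belief = belief_update(belief, obs_prob, "ask_goal", "log_glucose")
```

Starting from a uniform belief, hearing "log_glucose" shifts the belief to 0.8 for that goal; the learned dialog policy then acts on the belief rather than on the raw, possibly misrecognized utterance.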
Learning Symbolic Models of Stochastic Domains
In this article, we work towards the goal of developing agents that can learn
to act in complex worlds. We develop a probabilistic, relational planning rule
representation that compactly models noisy, nondeterministic action effects,
and show how such rules can be effectively learned. Through experiments in
simple planning domains and a 3D simulated blocks world with realistic physics,
we demonstrate that this learning algorithm allows agents to effectively model
world dynamics.
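A probabilistic relational planning rule of the kind described maps an action to a distribution over alternative effect sets. The sketch below samples such a rule; the `pickup` rule, its literals, and its probabilities are invented for illustration.

```python
import random

# Outcomes of one rule: (probability, literals added, literals deleted).
pickup_rule = [
    (0.8, {"holding(a)"}, {"on_table(a)"}),  # grasp succeeds
    (0.2, set(), set()),                     # noise outcome: nothing happens
]

def apply_rule(state, rule, rng):
    """Sample one outcome of a noisy, nondeterministic planning rule and
    apply its add/delete effects to the state (a set of ground literals)."""
    r, acc = rng.random(), 0.0
    for prob, add, delete in rule:
        acc += prob
        if r < acc:
            return (state - delete) | add
    return state

rng = random.Random(0)
successes = sum("holding(a)" in apply_rule({"on_table(a)"}, pickup_rule, rng)
                for _ in range(1000))
```

Over many samples the empirical success rate approaches the rule's 0.8, which is exactly the statistic a rule learner estimates from observed action effects.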
Competition in Social Networks: Emergence of a Scale-free Leadership Structure and Collective Efficiency
Using the minority game as a model for competition dynamics, we investigate
the effects of inter-agent communications on the global evolution of the
dynamics of a society characterized by competition for limited resources. The
agents communicate across a social network with small-world character that
forms the static substrate of a second network, the influence network, which is
dynamically coupled to the evolution of the game. The influence network is a
directed network, defined by the inter-agent communication links on the
substrate along which communicated information is acted upon. We show that the
influence network spontaneously develops hubs with a broad distribution of
in-degrees, defining a robust leadership structure that is scale-free.
Furthermore, in realistic parameter ranges, facilitated by information exchange
on the network, agents can generate a high degree of cooperation making the
collective almost maximally efficient.
Comment: 4 pages, 2 postscript figures included
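The minority game underlying the model is simple to state: each round, every agent picks one of two sides and the less-crowded side wins. A bare-bones sketch without the communication network follows; the agent count and the uniformly random policy are illustrative simplifications.

```python
import random

def play_round(n_agents, rng):
    """One round of the minority game: each agent picks side 0 or 1;
    the side chosen by fewer agents wins (competition for a scarce
    resource), so fewer than half the agents can win per round."""
    choices = [rng.randrange(2) for _ in range(n_agents)]
    ones = sum(choices)
    minority = 1 if ones < n_agents / 2 else 0
    winners = [i for i, c in enumerate(choices) if c == minority]
    return minority, winners

rng = random.Random(0)
minority, winners = play_round(101, rng)
```

With an odd number of agents there is always a strict minority; collective efficiency in this game is usually measured by how close the winner count stays to that maximum of (n_agents - 1) / 2 over many rounds.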
Self-Modification of Policy and Utility Function in Rational Agents
Any agent that is part of the environment it interacts with and has versatile
actuators (such as arms and fingers), will in principle have the ability to
self-modify -- for example by changing its own source code. As we continue to
create more and more intelligent agents, chances increase that they will learn
about this ability. The question is: will they want to use it? For example,
highly intelligent systems may find ways to change their goals to something
more easily achievable, thereby `escaping' the control of their designers. In
an important paper, Omohundro (2008) argued that goal preservation is a
fundamental drive of any intelligent system, since a goal is more likely to be
achieved if future versions of the agent strive towards the same goal. In this
paper, we formalise this argument in general reinforcement learning, and
explore situations where it fails. Our conclusion is that the self-modification
possibility is harmless if and only if the value function of the agent
anticipates the consequences of self-modifications and uses the current
utility function when evaluating the future.
Comment: Artificial General Intelligence (AGI) 201
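The paper's condition can be caricatured in a few lines: an agent that could install a trivially satisfied utility function only rejects the rewrite if its value function scores the future with the *current* utility. This toy is illustrative only; the outcome names and utilities are invented, and the paper's actual formalism is general reinforcement learning.

```python
def evaluate(self_modify, use_current_utility):
    """Score the future of an agent that may overwrite its own utility.
    current_u rewards real goal achievement; trivial_u is the utility
    the agent could install, which rewards every outcome equally."""
    current_u = lambda outcome: 1.0 if outcome == "goal" else 0.0
    trivial_u = lambda outcome: 1.0
    # After self-modifying, the agent idles instead of pursuing the goal.
    outcome = "idle" if self_modify else "goal"
    u = current_u if use_current_utility or not self_modify else trivial_u
    return u(outcome)

# Anticipating agent: judges the rewrite with its current utility.
safe = evaluate(True, use_current_utility=True)
# Naive agent: lets the installed utility score the future.
unsafe = evaluate(True, use_current_utility=False)
```

The anticipating agent scores the self-modification at 0 and keeps its goal, while the naive agent scores it at the maximum and "escapes" its designers' objective, mirroring the harmless-if-and-only-if condition in the abstract.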
Regression with Linear Factored Functions
Many applications that use empirically estimated functions face a curse of
dimensionality, because the integrals over most function classes must be
approximated by sampling. This paper introduces a novel regression algorithm
that learns linear factored functions (LFF). This class of functions has
structural properties that allow certain integrals to be solved analytically
and point-wise products to be computed. Applications like belief propagation
and reinforcement learning can exploit these properties to break the curse
and speed up computation. We derive a regularized greedy optimization scheme
that learns factored basis functions during training. The novel regression
algorithm performs competitively with Gaussian processes on benchmark tasks,
and the learned LFF functions are very compact, with 4-9 factored basis
functions on average.
Comment: Under review as conference paper at ECML/PKDD 201
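The structural property the abstract relies on can be shown directly: a point-wise product of two linear factored functions is again a linear factored function, with factors multiplied dimension-wise. This sketch assumes the standard LFF form f(x) = Σ_k w_k Π_d φ_kd(x_d); the example functions are invented.

```python
from math import prod

def lff_eval(weights, factors, x):
    """Evaluate f(x) = sum_k w_k * prod_d phi_kd(x_d), where each basis
    function is a product of one-dimensional factor functions."""
    return sum(w * prod(phi(xd) for phi, xd in zip(fs, x))
               for w, fs in zip(weights, factors))

def lff_product(w1, f1, w2, f2):
    """Point-wise product of two LFFs, computed analytically: factors
    multiply dimension-wise, giving len(w1)*len(w2) basis functions."""
    weights, factors = [], []
    for a, fa in zip(w1, f1):
        for b, fb in zip(w2, f2):
            weights.append(a * b)
            factors.append([(lambda p, q: lambda t: p(t) * q(t))(p, q)
                            for p, q in zip(fa, fb)])
    return weights, factors

# f(x) = x0 * (x1 + 1) and g(x) = 2 * x0**2, each as a 1-term LFF in 2D.
wf, ff = [1.0], [[lambda t: t, lambda t: t + 1]]
wg, fg = [2.0], [[lambda t: t * t, lambda t: 1.0]]
wp, fp = lff_product(wf, ff, wg, fg)
```

The product never expands the function on a grid, which is why operations like belief propagation can sidestep sampling-based integration over high-dimensional inputs.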